Human conversation is inherently complex, often spanning many different topics/domains. This makes policy learning for dialogue systems very challenging. Standard flat reinforcement learning methods do not provide an efficient framework for modelling such dialogues. In this paper, we focus on the under-explored problem of multi-domain dialogue management. First, we propose a new method for hierarchical reinforcement learning using the option framework. Next, we show that the proposed architecture learns faster and arrives at a better policy than the existing flat ones do. Moreover, we show how pretrained policies can be adapted to more complex systems with an additional set of new actions. In doing that, we show that our approach has the potential to facilitate policy optimisation for more sophisticated multi-domain dialogue systems.
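To make the option framework concrete: a top-level (master) policy chooses among temporally extended "options" (here, one sub-policy per dialogue domain), and the selected option then issues primitive dialogue acts until its termination condition fires. The sketch below is a minimal toy illustration of that control flow, not the paper's architecture; all names (`Option`, `master_policy`, the hotel/restaurant domains) and the fixed-turn-budget termination rule are illustrative assumptions.

```python
import random

random.seed(0)

class Option:
    """A sub-dialogue policy for one domain with its own termination rule."""
    def __init__(self, name, actions, max_turns):
        self.name = name
        self.actions = actions      # primitive dialogue acts for this domain
        self.max_turns = max_turns  # toy termination condition: fixed turn budget

    def run(self):
        """Select primitive actions until the option terminates."""
        # random.choice stands in for a learned intra-option policy
        return [(self.name, random.choice(self.actions))
                for _ in range(self.max_turns)]

def master_policy(step, options):
    """Top-level policy over options: decides which domain to handle next."""
    # Stand-in for a learned master policy: simply alternate between domains.
    return options[step % len(options)]

# Two toy domains, e.g. a hotel + restaurant multi-domain dialogue.
options = [
    Option("hotel", ["request_area", "inform_price"], max_turns=2),
    Option("restaurant", ["request_food", "offer_venue"], max_turns=2),
]

dialogue = []
for step in range(3):  # three option invocations by the master policy
    dialogue.extend(master_policy(step, options).run())

print(len(dialogue))  # 3 option invocations x 2 turns each = 6 primitive actions
```

In a learned system, both the master policy and each intra-option policy would be trained (e.g. with reinforcement learning over the respective state spaces), and termination would itself be a learned function of state rather than a fixed budget; the hierarchy is what lets each sub-policy specialise on one domain.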